The goal of clustering is the partitioning of a given set of objects into subsets called clusters in
such a way that the objects in a cluster are similar to one another and that objects in different
clusters are dissimilar. Clustering may help in getting a more or less direct understanding of the
relationships among the objects, and it may be useful as a first step in pattern recognition. Some
possible applications are automatic phoneme recognition, data base management systems, personnel
classification, detection of errors in files, and computer security.

APPLICATION OF AUTOMATIC CLUSTERING
TO EMITTER IDENTIFICATION

1. INTRODUCTION


Research in and applications of artificial intelligence [4] are
the main thrusts of the Computer Science Laboratory, Communications
Sciences Division, Naval Research Laboratory. At present, we are working
on phoneme recognition for a low-bandwidth speech communication
system, a computer-controlled manipulator, and an intelligent data base
management system, as well as automatic clustering and its application
to emitter identification. In this paper, we shall focus on the automatic
clustering work. The goal of clustering [1,2,3,5,6] is the partitioning
of a given set of objects into subsets called clusters in such
a way that the objects in a cluster are similar to one another and that
objects in different clusters are dissimilar. Two objects may be considered
similar if, for example, the Euclidean distance between them in
the measurement (feature) space is small.
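
As an illustration of the similarity criterion just mentioned, the following minimal sketch computes the Euclidean distance between two feature vectors. The function names, the threshold, and the three-feature samples are assumptions made for the example; they are not taken from Table 1.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of features")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def are_similar(a, b, threshold):
    """Call two objects similar when their distance is below an assumed threshold."""
    return euclidean_distance(a, b) < threshold


# Hypothetical three-feature samples, for illustration only.
sample_a = [1.0, 2.0, 3.0]
sample_b = [1.0, 2.0, 5.0]
sample_c = [9.0, 9.0, 9.0]

print(euclidean_distance(sample_a, sample_b))
print(are_similar(sample_a, sample_b, threshold=3.0))
print(are_similar(sample_a, sample_c, threshold=3.0))
```

What counts as "small" depends on the scale of the measurements, so the threshold would have to be chosen for each data set.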

Clustering has two main purposes. First, clustering may help in
getting a more or less direct understanding of the relationships among
the objects. Second, clustering may be useful as a first step in
pattern recognition. In pattern recognition (unlike clustering), each
presented object must be labeled with its class membership. After
clustering (with labels ignored), one can "look at" the data to estimate
whether pattern recognition will be easy or difficult, whether a given
class should be combined with another class or divided into two or more
classes, etc.

Since clustering is quite general, it has many applications. Later
we shall see that clustering can be applied to emitter identification.
It can also be applied to automatic phoneme recognition and personnel
classification. It is often very important to find a cluster having only one
member. Such a cluster may be a mistake in the data base or represent
a very unusual object. For example, an unusual use of a computer system
may be an attempted security penetration.

2. CLUSTERING THE EMITTER IDENTIFICATION DATA


Each row in Table 1 is a sample (object) representing 18 measurements
of an emission. Our sponsor gave us 18 samples. Without knowing
anything else about the data, we applied several clustering techniques,
some of which we had developed [2,5,6]. We obtained the following four
clusters:


(1)  1, 2

(2)  3, 5, 6, 7, 9, 10, 11

(3)  4, 17, 18

(4)  8, 12, 13, 14, 15, 16

This means that samples (rows) 1 and 2 are in cluster (1), etc.
Note:  Manuscript  submitted  October  29,  1976. 
Our sponsor then told us that the "real" clusters should be as follows:

(1')  5, 6, 10, 11

(2')  4, 17, 18

(3')  3, 7, 9

(4')  1, 2, 8, 12, 13, 14, 15, 16

Thus, the results were quite good. Careful analysis of the data
revealed the following:

(a)  If cluster (1) has to be combined with any other cluster,
it should be combined with cluster (4). However, cluster
(1) is definitely different from cluster (4).

(b)  From only the given data, one really cannot say that
cluster (1') and cluster (3') should be two clusters
rather than being combined into one cluster.

Our sponsor agreed with both of these statements.
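
One way to put a number on how closely clusters (1) through (4) match the sponsor's clusters (1') through (4') is pairwise agreement: the fraction of sample pairs that both partitions treat the same way (together in both or apart in both). This measure, usually called the Rand index, is offered here only as an illustration; it is not one of the methods used in this work. The sketch below uses the cluster memberships listed above; the function names are assumptions.

```python
from itertools import combinations

# Clusters obtained without knowledge of the classes.
found = {
    "1": [1, 2],
    "2": [3, 5, 6, 7, 9, 10, 11],
    "3": [4, 17, 18],
    "4": [8, 12, 13, 14, 15, 16],
}

# "Real" clusters supplied afterward by the sponsor.
real = {
    "1'": [5, 6, 10, 11],
    "2'": [4, 17, 18],
    "3'": [3, 7, 9],
    "4'": [1, 2, 8, 12, 13, 14, 15, 16],
}


def labels_from(clusters):
    """Map each sample number to the name of the cluster containing it."""
    return {s: name for name, members in clusters.items() for s in members}


def rand_index(clusters_a, clusters_b):
    """Fraction of sample pairs on which two partitions agree."""
    la, lb = labels_from(clusters_a), labels_from(clusters_b)
    if set(la) != set(lb):
        raise ValueError("both partitions must cover the same samples")
    pairs = list(combinations(sorted(la), 2))
    agree = sum((la[i] == la[j]) == (lb[i] == lb[j]) for i, j in pairs)
    return agree / len(pairs)


print(round(rand_index(found, real), 3))
```

For these two partitions, 129 of the 153 sample pairs agree. The disagreements come from cluster (2), which merges (1') with (3'), and from cluster (1), which is split off from (4'); these are the two cases discussed in (a) and (b).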

We were then supplied with 12 new samples (rows) with some missing
values. (See Table 2.) After using various clustering methods, we
concluded that A, B, C, D, E, and F belong to cluster (1'), that G, H, and
I belong to (2'), and that J, K, and L belong to either (1') or (3').
Then our sponsor told us that the first two conclusions are absolutely
correct. He said that samples J, K, and L belong to cluster (3'). We
could say only that they belong to either (1') or (3'), because (1')
and (3') are so close to each other.
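
The kind of ambiguity described above can be sketched as follows. This is not the procedure that was actually used, which the text does not specify; it is a simple nearest-centroid rule, assumed for illustration, that compares a sample to each cluster only on the features that were measured and reports every cluster whose distance is close to the smallest one. The centroids, samples, tolerance, and function names are all hypothetical.

```python
import math


def partial_distance(sample, centroid):
    """Root-mean-square difference over the features present in the sample.

    Missing measurements are represented by None and are skipped.
    """
    diffs = [(s - c) ** 2 for s, c in zip(sample, centroid) if s is not None]
    if not diffs:
        raise ValueError("sample has no observed features")
    return math.sqrt(sum(diffs) / len(diffs))


def candidate_clusters(sample, centroids, tolerance=0.25):
    """Names of all clusters within (1 + tolerance) times the nearest distance."""
    distances = {name: partial_distance(sample, c) for name, c in centroids.items()}
    best = min(distances.values())
    return sorted(n for n, d in distances.items() if d <= best * (1 + tolerance))


# Hypothetical three-feature centroids; (1') and (3') are deliberately close.
centroids = {
    "1'": [1.0, 1.0, 1.0],
    "2'": [5.0, 5.0, 5.0],
    "3'": [1.2, 1.0, 1.1],
}

sample_g = [5.1, None, 4.9]   # second feature missing
sample_j = [1.1, None, 1.05]  # second feature missing

print(candidate_clusters(sample_g, centroids))
print(candidate_clusters(sample_j, centroids))
```

With these made-up values, `sample_g` is assigned to a single cluster, while `sample_j` lies about equally far from two nearby centroids, so both are returned rather than forcing a choice.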

3.  TWO  CLUSTERING  METHODS 


Two of the six clustering methods we used are data reorganization
[5] and two principal components [1,3]. The other four are minimal
spanning trees [1,3], non-linear mapping [6], and two versions of a
triangulation method [2]. We do not describe these four here, since they
are fairly complicated to explain and are described elsewhere. In data
reorganization, rows (and sometimes columns, as here) are permuted to
put similar rows (and columns) together. (Two n-tuples are similar to
the extent that the n-dimensional distance between them is small.) The
result for the 18 samples is shown in Table 3. After this reorganization
was done, the sponsor told us the classes. Table 3 is perfect in
the sense that it could be broken between rows into four parts, each
exactly corresponding to a given class. We also used the reorganization
by rows for automatically obtaining the clustering hierarchy shown
in Fig. 1. Table 3 is broken into two tables by dividing between the
adjacent rows whose distance apart is largest. These tables are divided
further, etc. Partly based on this hierarchy, we obtained the clusters
(1) through (4) given earlier.
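
The splitting step that produces the hierarchy can be sketched as follows, assuming the rows have already been reorganized so that similar rows are adjacent. The stopping rule (a fixed maximum depth), the function names, and the one-feature rows are assumptions for illustration; the data in Table 3 is not reproduced here.

```python
import math


def distance(a, b):
    """Euclidean distance between two rows."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def split_hierarchy(names, rows, max_depth):
    """Recursively split an ordered table between the adjacent rows farthest apart.

    Returns a list of row names for a leaf, or a (top, bottom) tuple of subtrees.
    """
    if max_depth == 0 or len(rows) < 2:
        return list(names)
    gaps = [distance(rows[i], rows[i + 1]) for i in range(len(rows) - 1)]
    cut = gaps.index(max(gaps)) + 1  # first row of the bottom table
    return (
        split_hierarchy(names[:cut], rows[:cut], max_depth - 1),
        split_hierarchy(names[cut:], rows[cut:], max_depth - 1),
    )


# Hypothetical reorganized table: five rows with one feature each.
names = ["a", "b", "c", "d", "e"]
rows = [[0.0], [0.1], [1.0], [1.3], [5.0]]

print(split_hierarchy(names, rows, max_depth=2))
```

Here the first division falls between rows "d" and "e", where the gap is largest, and the upper table is then divided between "b" and "c".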




Table 3. Data Reorganized by Row and Column
In the method of two principal components, the data is projected
from an 18-dimensional space onto the "best" plane (two-dimensional
space). The result for our data is shown in Fig. 2. Each of the two
axes is a linear combination of the original 18 features. Again we see
that the results are good. Partly based on Fig. 2, we obtained clusters
(1) through (4) given earlier.
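
A minimal sketch of such a projection is given below, assuming that the "best" plane is the one that preserves the most variance of the data, which is the usual meaning for principal components. It uses NumPy's singular value decomposition; the function name is an assumption, and the random 18-by-18 array is only a placeholder for the measurements in Table 1.

```python
import numpy as np


def project_to_two_components(data):
    """Project the rows of `data` onto the two directions of greatest variance.

    Returns the two-dimensional coordinates and the two axes, each of which
    is a linear combination of the original features.
    """
    data = np.asarray(data, dtype=float)
    centered = data - data.mean(axis=0)
    # Rows of vt are orthonormal feature-space directions, ordered by
    # decreasing singular value (that is, by decreasing variance).
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    axes = vt[:2]
    return centered @ axes.T, axes


# Placeholder data: 18 samples with 18 features each (not the real data).
rng = np.random.default_rng(0)
samples = rng.normal(size=(18, 18))

coords, axes = project_to_two_components(samples)
print(coords.shape, axes.shape)
```

Each row of `coords` gives the position of one sample in the plane, so the rows can be plotted directly to look for groups of nearby points.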

4.  FUTURE  WORK 


Our sponsor will give us a much larger set of data, and we are
looking forward to analyzing it. We hope that the results will continue
to be favorable. We shall also be applying clustering techniques to
automatic phoneme recognition and intelligent data base management
systems.